refactor(data): Refactor wordSearch and other methods for clarity and maintainability #1436
base: main
@melink14 Don't mind the extra commits at the end. I accidentally did a I did want to ask you about that. I have to update my Let me know what you think about the contributions and please let me know if there is anything that you think could be done differently, refactored, etc... Thank you very much!
Based on the new screenshots that were pushed, it looks like there are some unexpected changes to how words are looked up (fewer words are found); we'll need to debug that.
Codecov Report
@@            Coverage Diff             @@
##             main    #1436      +/-   ##
==========================================
+ Coverage   79.62%   82.00%   +2.37%
==========================================
  Files           7        8       +1
  Lines        3004     3189     +185
  Branches      189      189
==========================================
+ Hits         2392     2615     +223
+ Misses        607      570      -37
+ Partials        5        4       -1
Didn't look at the tests yet due to time constraints, but mostly nits with some high-level questions on whether the new code is functionally equivalent to the old code.
Also, it seems there are still some unexpected screenshot diffs! (title translations aren't continuing past the first word?)
Another thought: I wonder if we should just use wanakana library to do all this stuff: https://www.npmjs.com/package/wanakana
Seems to have what we need and is small.
edit: maybe doesn't handle half width...
I forgot to answer this question but this can probably be updated to fall back to an environment variable or command line option. I might also test if it's even required anymore (since there should be some logic to find chrome in a normal way) but I think I was debugging some problems in WSL which made me add it originally.
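A minimal sketch of the fallback order suggested in the comment above: a command line option first, then an environment variable, then nothing (so the launcher can find Chrome on its own). The names `resolveChromePath`, `--chrome-path` and `CHROME_PATH` are assumptions made for illustration; none of them come from this PR.

```typescript
// Hypothetical names: `--chrome-path` and `CHROME_PATH` are illustrative
// assumptions, not options that the project is known to support.
type Env = Record<string, string | undefined>;

/**
 * Returns the Chrome path given as `--chrome-path=<path>` in `args`, falling
 * back to `env.CHROME_PATH`, or undefined when neither is set so that the
 * caller can rely on the default browser discovery.
 */
function resolveChromePath(
  args: readonly string[],
  env: Env
): string | undefined {
  const prefix = '--chrome-path=';
  for (const arg of args) {
    // Only accept the option when it carries a non-empty value.
    if (arg.indexOf(prefix) === 0 && arg.length > prefix.length) {
      return arg.slice(prefix.length);
    }
  }
  const fromEnv = env['CHROME_PATH'];
  return fromEnv !== undefined && fromEnv !== '' ? fromEnv : undefined;
}
```

In a Node script this would typically be called with `process.argv.slice(2)` and `process.env`; taking both as parameters keeps the function easy to test.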
So I went ahead and commented out the
@melink14 all tests are passing now. Fixed a few logic issues. Going to do a bit more cleaning up of variable names and typing tomorrow morning. 🥂
I see the images are still there but that might be due to the presubmit not re-running to update them. (Since it doesn't re-run when a GitHub Action pushes a commit) I'll close and reopen the pull request to get it to re-run. (I should add a comment or label command for it...)
This hasn't been updated in years but looks like it does what we need it to do?
Might be worth trying though if we found problems I guess we'd need to fork it.
I still haven't looked at the tests yet but I can feel the implementation is pretty close probably.
extension/data.ts
Outdated
for (let i = 0; i < str.length; i++) {
  const char = str.charAt(i);
  const nextChar = i + 1 <= str.length - 1 ? str.charAt(i + 1) : null;
  const nextCharCode = nextChar?.charCodeAt(0);
It's clearer if we store the characters as Unicode, as discussed in the other thread.
extension/data.ts
Outdated
  0x307c,
];
cs: number[] = [0x3071, 0x3074, 0x3077, 0x307a, 0x307d];
convertToHiragana(str: string): string {
I might have said this elsewhere but I definitely recommend a JSDoc for this method, and maybe a tweak to the name since it also has a truncation function now (which we added back for consistency).
It's at line 352; I left a comment there for easy reference.
The TS doc is solid but I'll leave some pedantic comments (mostly conforming to best practices developed in my day job but that have served me well)
  currentCharCode <= KANA.HW_KATAKANA_END;
let key = '';
if (currentCharCode < 0x3000) {
  break;
This is the truncation I meant; with this line the string will be truncated at the first non-kana/kanji character.
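To make the effect of that `break` concrete, here is a small standalone sketch of the truncation on its own. It assumes the same `0x3000` cut-off as the hunk under review; the function and constant names are hypothetical and not taken from the PR.

```typescript
// Assumption: 0x3000 mirrors the threshold in the reviewed hunk. Everything
// below it (ASCII, Latin, most punctuation) is treated as "not kana/kanji".
const FIRST_JAPANESE_CHAR_CODE = 0x3000;

/** Returns `text` cut off just before its first character below U+3000. */
function truncateAtFirstNonJapanese(text: string): string {
  let result = '';
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) < FIRST_JAPANESE_CHAR_CODE) {
      // Stop at the first non-Japanese character; the rest is dropped too.
      break;
    }
    result += text.charAt(i);
  }
  return result;
}
```

Under this rule, Japanese text that follows a Latin character is dropped as well, which may be why the reviewer suggests that the method name should hint at the truncation.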
@@ -0,0 +1,186 @@
export const kanaToHiraganaNormalizationMap: Record<string, string> = {
Should this be constant case? (and readonly I guess)
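One way the suggestion could look, as a sketch only: the constant-case name is a guess at what the reviewer has in mind, and the three entries are an illustrative subset rather than the contents of the map in the PR.

```typescript
// Illustrative subset; the entries shown here are assumptions, and the map
// in the PR is much larger.
const KANA_TO_HIRAGANA_NORMALIZATION_MAP: Readonly<Record<string, string>> = {
  'ア': 'あ', // full-width katakana
  'イ': 'い',
  'ｱ': 'あ', // half-width katakana
};
```

`Readonly<...>` is a compile-time check only: assignments such as `KANA_TO_HIRAGANA_NORMALIZATION_MAP['ウ'] = 'う'` are rejected by the type checker, but the object is not frozen at runtime.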
  );
}
/**
 * Returns the input string converted into hiragana. If any characters are not
nit: It's good to reference the inputs explicitly in the summary statement so that you don't need to use an @param. The @ markers can be useful for additional information that doesn't flow naturally in the summary statement but there's no need to repeat information there that can be read once at the beginning!
So in this case something like:
Returns the given `kanaWord` with katakana and half width characters converted to full-width hiragana. ヴ is not converted. If a non-kana character is found in `kanaWord`, that and the following characters are omitted from the returned conversion.
}
/**
 * Returns the input string converted into hiragana. If any characters are not
 * found in the [NormalizationMap](./character_info.ts) then the character
This information is all true but it's more of an implementation detail vs a spec. Ideally, the doc string should be about the goal of the method and needn't go into detail about how we accomplished that. Basically, anything that could be changed without updating a test, shouldn't go in the doc string.
Refactor wordSearch to use lookup object
Added a lookup object to more cleanly convert from katakana to hiragana.
Removed nested if/for block and replaced with a more concise convertToHiragana method.
Added an isKana method to return whether or not the current character is indeed Japanese.
Renamed a few variables for code readability.
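The changes listed above can be sketched roughly as follows. This is an approximation under stated assumptions, not the code from the PR: the lookup object holds only a handful of illustrative entries, `isKana` checks only the hiragana and katakana Unicode blocks, and the truncation and half-width handling discussed in the review are left out.

```typescript
// Illustrative subset of a katakana-to-hiragana lookup object. The entries
// and the constant name are assumptions; the PR's map is much larger.
const KATAKANA_TO_HIRAGANA: Readonly<Record<string, string>> = {
  'カ': 'か',
  'タ': 'た',
  'ナ': 'な',
  'ガ': 'が',
};

/** Returns whether `char` starts with a hiragana or katakana character. */
function isKana(char: string): boolean {
  const code = char.charCodeAt(0);
  // U+3040..U+309F is the hiragana block, U+30A0..U+30FF the katakana block.
  return code >= 0x3040 && code <= 0x30ff;
}

/**
 * Returns `word` with every character found in the lookup object replaced
 * by its hiragana equivalent. Other characters are copied unchanged.
 */
function convertToHiragana(word: string): string {
  let result = '';
  for (let i = 0; i < word.length; i++) {
    const char = word.charAt(i);
    const mapped = KATAKANA_TO_HIRAGANA[char];
    result += mapped !== undefined ? mapped : char;
  }
  return result;
}
```

A single table lookup per character replaces the nested if/for logic, so supporting another character only requires a new entry in the lookup object.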
Testing